In January 2025, devastating wildfires swept through Los Angeles, becoming some of the most destructive in California’s history. (TIME Magazine, 2025) reported that climate change has “exacerbated the size, intensity, and damage caused by wildfires,” and experts noted that “the key question for living with wildfire is how we as humans manage the risks”.
In this lab, you will analyze real California Wildfire data to investigate these claims. You will explore historical patterns, examine relationships between climate variables and fire outcomes, and develop data-driven insights about wildfire risk management.
By the end of this lab, you will be able to:
# Install packages if needed (uncomment if necessary)
# install.packages(c("tidyverse", "ggplot2", "dplyr", "readr", "scales",
# "viridis", "ggrepel", "corrplot", "GGally",
# "ggridges", "plotly", "RColorBrewer", "patchwork"))
# Load required libraries
library(tidyverse)
library(ggplot2)
library(dplyr)
library(readr)
library(scales)
library(viridis)
library(ggrepel)
library(corrplot)
library(GGally)
library(ggridges)
library(plotly)
library(RColorBrewer)
library(patchwork)
# Set custom theme for all plots
theme_wildfire <- theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14, hjust = 0, color = "#2C3E50"),
plot.subtitle = element_text(color = "#7F8C8D", size = 11),
plot.caption = element_text(color = "#95A5A6", size = 9, hjust = 1),
axis.title = element_text(face = "bold", size = 11),
axis.text = element_text(size = 10),
panel.grid.minor = element_blank(),
panel.grid.major = element_line(color = "#ECF0F1"),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
strip.text = element_text(face = "bold", size = 11)
)
# Set as default theme
theme_set(theme_wildfire)
# Define color palettes
fire_colors <- c("#FF6B35", "#D32F2F", "#FFA726", "#8B0000", "#FFD700")
drought_colors <- c("Abnormally Dry" = "#FFEDA0",
"Moderate" = "#FEB24C",
"Severe" = "#F03B20",
"Extreme" = "#BD0026",
"Exceptional" = "#800026")
region_colors <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00",
"#FFFF33", "#A65628", "#F781BF", "#999999", "#66C2A5")
This section prepares you for the visualizations in this lab. If you’re new to R or ggplot2, work through this warm-up before starting Part 1.
Keep these references handy as you work through the lab:
| Resource | Description | Link |
|---|---|---|
| ggplot2 Cheatsheet | Quick reference for all ggplot2 functions | https://ggplot2.tidyverse.org |
| R Graph Gallery | Visual examples with code | https://r-graph-gallery.com |
| R Programming 101 | Video tutorials for beginners | https://youtube.com/playlist?list=PLtL57Fdbwb_C6RS0JtBojTNOMVlgpeJkS |
ggplot2 builds plots by adding layers using the
+ operator. Think of it like stacking transparent
sheets:
ggplot(data, aes(x, y)) + # Layer 1: Canvas + coordinate mapping
geom_point() + # Layer 2: Add points
labs(title = "My Plot") + # Layer 3: Add labels
theme_minimal() # Layer 4: Apply styling
Key components:
| Component | Purpose | Example |
|---|---|---|
ggplot() |
Initialize the plot with data | ggplot(data = my_data) |
aes() |
Map variables to visual properties | aes(x = year, y = acres) |
geom_*() |
Specify the type of plot | geom_bar(), geom_line(),
geom_point() |
labs() |
Add titles and labels | labs(title = "Fire Trends") |
theme_*() |
Control appearance | theme_minimal() |
Before you start coding, familiarize yourself with these frequent mistakes:
| Error | What You Wrote | What You Should Write |
|---|---|---|
Missing + between layers |
ggplot(df, aes(x, y)) geom_point() |
ggplot(df, aes(x, y)) + geom_point() |
Forgetting closing ) |
aes(x = year, y = acres |
aes(x = year, y = acres) |
Missing comma in aes() |
aes(x = year y = acres) |
aes(x = year, y = acres) |
| Variable name typo | aes(x = Year) when column is year |
aes(x = year) — R is case-sensitive! |
Using = instead of == in filter |
filter(region = "North") |
filter(region == "North") |
| Quotes around variable names | aes(x = "year") |
aes(x = year) — no quotes for variables |
| Plus sign at line start | ggplot(df, aes(x, y))+ geom_point() |
ggplot(df, aes(x, y)) +geom_point() |
Tip: If you get an error, read it carefully! R often tells you exactly what’s wrong (e.g., “unexpected symbol” usually means a missing comma or parenthesis).
Practice these exercises using the built-in mtcars
dataset before tackling the wildfire data.
Run this code to create a simple scatter plot:
# This is provided for you - just run it!
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
geom_point()
What to notice: The aes() maps
wt (weight) to the x-axis and mpg (miles per
gallon) to the y-axis.
Complete this code to create a bar chart showing the count of cars by
number of cylinders (cyl):
# Fill in the blank: what variable goes on the x-axis?
ggplot(data = mtcars, aes(x = _____)) +
geom_bar()
ggplot(data = mtcars, aes(x = cyl)) +
geom_bar()
Modify this scatter plot to color points by the number of cylinders
(cyl):
# Add color inside aes() to map cyl to point color
ggplot(data = mtcars, aes(x = wt, y = mpg, _____ = _____)) +
geom_point(size = 3)
ggplot(data = mtcars, aes(x = wt, y = mpg, color = cyl)) +
geom_point(size = 3)
Use filter() to show only 6-cylinder cars, then
plot:
# Filter for 6-cylinder cars, then create a scatter plot
mtcars %>%
filter(cyl _____ _____) %>%
ggplot(aes(x = wt, y = mpg)) +
geom_point()
mtcars %>%
filter(cyl == 6) %>%
ggplot(aes(x = wt, y = mpg)) +
geom_point()
Put it all together! Fill in the blanks to filter, plot, and label:
mtcars %>%
filter(hp > _____) %>% # Cars with more than 150 horsepower
ggplot(aes(x = _____, y = _____, color = cyl)) + # Plot weight vs mpg
geom_point(size = 3) +
labs(
title = "_____", # Add a descriptive title
x = "Weight (1000 lbs)",
y = "Miles per Gallon"
)
mtcars %>%
filter(hp > 150) %>%
ggplot(aes(x = wt, y = mpg, color = cyl)) +
geom_point(size = 3) +
labs(
title = "Fuel Efficiency of High-Horsepower Cars",
x = "Weight (1000 lbs)",
y = "Miles per Gallon"
)
You’ll encounter these plot types throughout the lab:
| geom | Use For | Example |
|---|---|---|
geom_bar() |
Counting categories | Number of fires by drought level |
geom_col() |
Bars with pre-calculated values | Acres burned by year |
geom_line() |
Trends over time | Temperature changes 2001-2024 |
geom_point() |
Scatter plots | Acres vs. temperature |
geom_boxplot() |
Distribution comparison | Acres burned by region |
geom_histogram() |
Single variable distribution | Distribution of fire sizes |
geom_smooth() |
Trend lines | Adding regression to scatter plots |
Ready? Once you’ve completed the warm-up exercises, proceed to Part 1!
The TIME Magazine article states: “The key question for living with wildfire is how we as humans manage the risks.” Before exploring the data, what do you think are the main factors that contribute to wildfire risk?
SOLUTION:
Import the statewide wildfire dataset
(ca_wildfires_statewide_2001_2024.csv). Use
dim() and str() to explore its structure. How
many years of data are included?
# Import the statewide time series data
statewide <- read.csv("ca_wildfires_statewide_2001_2024.csv", stringsAsFactors = FALSE)
# Check dimensions
dim(statewide)
## [1] 24 7
# Examine structure
str(statewide)
## 'data.frame': 24 obs. of 7 variables:
## $ year : int 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 ...
## $ fire_count : int 217 254 364 286 322 332 357 474 265 210 ...
## $ acres_burned : num 5413 10228 20083 5832 5002 ...
## $ avg_temp_f : num 57.9 57.6 58.1 57.8 57.5 ...
## $ tree_loss_acres : int 52189 68565 61289 66683 47175 86057 136817 214406 114220 35661 ...
## $ carbon_emissions: int 35641331 39092238 33221508 40625035 38657789 48874782 63318420 111709151 58619291 33145575 ...
## $ drought_level : chr "Abnormally Dry" "Moderate" "Abnormally Dry" "Moderate" ...
# Preview data
head(statewide)
SOLUTION:
The dataset contains 24 rows (years from 2001-2024) and 7 columns (variables). This represents 24 years of California Wildfire data.
Using the data dictionary, match each variable to its
definition. What does tree_loss_acres measure and what
units is it in?
SOLUTION:
| Variable | Definition | Units |
|---|---|---|
year |
Calendar year | Numeric (2001-2024) |
fire_count |
Number of fires statewide | Count |
acres_burned |
Total acres burned statewide | Acres |
avg_temp_f |
State average temperature | Degrees Fahrenheit |
tree_loss_acres |
Tree cover loss from all causes | Acres |
carbon_emissions |
CO₂ released from all tree cover loss | Megagrams (Mg) |
drought_level |
Drought severity category | Categorical |
tree_loss_acres measures the total tree
cover loss in acres (1 hectare = 2.47 acres). This
represents the forest area lost from all causes including wildfires,
logging, agriculture, natural disturbances, and development.
Important Note: The tree_loss_acres and
carbon_emissions variables represent all tree cover
loss in California, not just wildfire-related loss. According
to Global Forest Watch methodology, tree cover loss includes
stand-replacing wildfires, agricultural conversion, logging operations,
natural disturbances (storms, flooding, insect damage), and urban
development.
Calculate summary statistics for acres_burned
using summary(). What is the median acres burned per
year?
# Calculate summary statistics
summary(statewide$acres_burned)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2969 6712 12902 17336 21647 73235
# Additional statistics
cat("\n--- Additional Statistics ---\n")
##
## --- Additional Statistics ---
cat("Mean:", round(mean(statewide$acres_burned), 2), "acres\n")
## Mean: 17336.42 acres
cat("Median:", round(median(statewide$acres_burned), 2), "acres\n")
## Median: 12901.65 acres
cat("Standard Deviation:", round(sd(statewide$acres_burned), 2), "acres\n")
## Standard Deviation: 15874.62 acres
cat("Range:", round(max(statewide$acres_burned) - min(statewide$acres_burned), 2), "acres\n")
## Range: 70266.2 acres
SOLUTION:
The median acres burned per year is approximately 12,901.65 acres. Note that the mean (17,336.42 acres) is considerably higher than the median, indicating the distribution is right-skewed—a few extreme fire years pull the average upward.
The drought_level variable is categorical. Use
table() to count how many years fall into each drought
category. Which drought level is most common?
# Count years by drought level
drought_table <- table(statewide$drought_level)
print(drought_table)
##
## Abnormally Dry Exceptional Extreme Moderate Severe
## 10 2 3 6 3
# Calculate proportions
prop.table(drought_table) * 100
##
## Abnormally Dry Exceptional Extreme Moderate Severe
## 41.666667 8.333333 12.500000 25.000000 12.500000
# Create a modern horizontal bar chart
drought_summary <- statewide %>%
count(drought_level) %>%
mutate(drought_level = factor(drought_level,
levels = c("Abnormally Dry", "Moderate", "Severe",
"Extreme", "Exceptional")))
ggplot(drought_summary, aes(x = reorder(drought_level, n), y = n, fill = drought_level)) +
geom_col(width = 0.7, show.legend = FALSE) +
geom_text(aes(label = n), hjust = -0.3, fontface = "bold", size = 4) +
coord_flip() +
scale_fill_manual(values = c("Abnormally Dry" = "#FFEDA0",
"Moderate" = "#FEB24C",
"Severe" = "#FC4E2A",
"Extreme" = "#E31A1C",
"Exceptional" = "#800026")) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Distribution of Drought Levels in California (2001-2024)",
subtitle = "Number of years at each drought severity level",
x = NULL,
y = "Number of Years",
caption = "Source: U.S. Drought Monitor Data 2001-2024"
)
SOLUTION:
“Abnormally Dry” is the most common drought level, occurring in 10 out of 24 years (41.67%). The distribution shows:
Now import the county-level dataset
(ca_wildfires_county_2019_2023.csv). How many counties and
regions are in California based on this data?
# Import county-level data
county_data <- read.csv("ca_wildfires_county_2019_2023.csv", stringsAsFactors = FALSE)
# Check dimensions
dim(county_data)
## [1] 58 17
# Number of counties
n_counties <- nrow(county_data)
cat("Number of California counties:", n_counties, "\n")
## Number of California counties: 58
n_regions <- length(unique(county_data$region))
cat("Number of California regions:", n_regions, "\n")
## Number of California regions: 10
# Preview
head(county_data)
SOLUTION:
California has 58 counties and 10 regions represented in this dataset. This matches the actual number of counties in California.
region variable that groups
California’s 58 counties into 10 geographic regions. This map shows how
counties are organized:
California Census 2020 Regions. Source: California Department of Finance
Examine the data dictionary for the county dataset. What is the Wildland-Urban Interface (WUI)? Why is this concept important for understanding wildfire risk?
SOLUTION:
The Wildland-Urban Interface (WUI) is defined as the zone where human development meets or intermingles with wildland vegetation.
California Wildland–Urban Interface (WUI), 2020. Source: SILVIS Lab (University of Wisconsin–Madison) and U.S. Forest Service. Data derived from 2020 U.S. Census block geography and the 2019 National Land Cover Database (NLCD).
According to the data dictionary:
acres_wui: Total land area classified
as WUI, measured in acresprop_wui: Proportion of total county
land area classified as WUIWhy WUI is important for wildfire risk:
The TIME article specifically mentions that “wildfires will continue to cause devastation as long as areas that were previously natural vegetation are commercially developed” — this is precisely the WUI expansion issue.
Note on homes_damaged: This variable
counts residential structures that sustained greater than 50%
damage from wildfires, meaning homes that were destroyed or
damaged beyond half their value. This threshold ensures we are measuring
significant structural damage rather than minor fire effects.
Notice that San Francisco shows 0 acres burned and 0 homes damaged. Based on the note that “Only fires grown to 10 acres or more are recorded,” why might a densely urban area like San Francisco have no wildfire records?
# Extract San Francisco data
sf_data <- county_data %>% filter(county == "San Francisco")
print(sf_data)
## county region population median_income gini_index
## 1 San Francisco Los Angeles County 836321 141446 0.5181
## prop_65_older drought_level acres_burned prop_area_burned fires_human
## 1 0.1717 Moderate 0 0 0
## fires_natural fires_other homes_damaged acres_wui prop_wui carbon_emissions
## 1 0 0 0 847.31 0.0284 13141
## tree_loss_acres
## 1 21
# Compare with other urban areas
urban_comparison <- county_data %>%
filter(county %in% c("San Francisco", "Los Angeles", "San Diego", "Santa Cruz")) %>%
select(county, population, acres_burned, homes_damaged, prop_wui, acres_wui)
print(urban_comparison)
## county population acres_burned homes_damaged prop_wui acres_wui
## 1 Los Angeles 9848406 41584.73 363 0.3572 928268.57
## 2 San Diego 3282782 7138.23 104 0.4497 1211902.58
## 3 San Francisco 836321 0.00 0 0.0284 847.31
## 4 Santa Cruz 266021 21302.57 1431 0.7977 227245.03
SOLUTION:
San Francisco has zero wildfire records for several interconnected reasons:
Extreme urban density: San Francisco (47 sq mi) is one of the most densely populated cities in the US with ~836,000 residents. There is virtually no wildland vegetation to sustain a 10+ acre fire.
Minimal WUI: San Francisco has only 847 acres of WUI (the smallest in the state) and only 2.84% of its land is classified as WUI. Compare this to Santa Cruz County at 79.77% WUI.
10-acre reporting threshold: The data only includes fires that grew to 10 acres or more. Any small fires in San Francisco would be quickly contained by the dense urban infrastructure and fire department before reaching this threshold.
Geographic constraints: The city is surrounded by water on three sides (Pacific Ocean, San Francisco Bay), limiting fire spread potential.
Fire suppression infrastructure: Dense urban areas have extensive fire hydrant networks, fire stations, and rapid response capabilities.
Important statistical note: Zero does NOT mean “no fire risk” — it means no fires reached the recording threshold. This is an example of how data collection methods affect what we observe.
Create a time series line plot showing
fire_count from 2001-2024. Add proper axis labels and a
descriptive title. Based on the plot, describe the overall trend in the
number of fires.
Instructions: Fill in the blanks
(_____) to complete the visualization code. Each step is
labeled to guide you.
# STEP 1: Create the base plot
# Hint: The statewide dataset contains yearly data. Map year to x-axis, fire_count to y-axis.
ggplot(data = _____, aes(x = _____, y = _____)) +
# STEP 2: Add line geometry (color and linewidth provided)
geom_line(color = "#D32F2F", linewidth = 1.2) +
# STEP 3: Add point geometry (provided)
geom_point(color = "#D32F2F", size = 3) +
# STEP 4: Add smoothed trend line
# Hint: Use method = "loess" for local polynomial regression
geom_smooth(method = "_____", se = TRUE, alpha = 0.2, color = "#1976D2", linetype = "dashed") +
# STEP 5: Set x-axis breaks (provided)
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
# STEP 6: Format y-axis with commas (provided)
scale_y_continuous(labels = comma) +
# STEP 7: Add labels
# Hint: Write a title that describes what the plot shows
labs(
title = "_____",
subtitle = "Number of fires reaching 10+ acres per year with trend line",
x = "Year",
y = "Number of Fires",
caption = "Source: CALFIRE Data | Dashed line shows LOESS smoothed trend"
)
ggplot(data = statewide, aes(x = year, y = fire_count)) +
geom_line(color = "#D32F2F", linewidth = 1.2) +
geom_point(color = "#D32F2F", size = 3) +
geom_smooth(method = "loess", se = TRUE, alpha = 0.2, color = "#1976D2", linetype = "dashed") +
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = comma) +
labs(
title = "Annual Wildfire Count in California (2001-2024)",
subtitle = "Number of fires reaching 10+ acres per year with trend line",
x = "Year",
y = "Number of Fires",
caption = "Source: CALFIRE Data | Dashed line shows LOESS smoothed trend"
)
SOLUTION:
The fire count shows high variability with no clear linear trend, but some notable patterns:
The trend appears cyclical rather than consistently increasing. Notable spikes occurred in 2008, 2017, and 2020. The most recent years (2022-2024) show continued elevated fire counts compared to earlier decades.
Now create a similar time series for
acres_burned. Are the patterns similar to fire count? What
year had the most acres burned?
Instructions: Fill in the blanks to complete the code.
# STEP 1: Find the peak year for annotation
# Hint: Filter where acres_burned equals the maximum value
peak_year <- statewide %>% filter(acres_burned == _____(acres_burned))
# STEP 2: Create the base plot
# Hint: Same dataset, but now y = acres_burned
ggplot(data = _____, aes(x = _____, y = _____)) +
# STEP 3: Add line geometry (provided)
geom_line(color = "#FF6B35", linewidth = 1.2) +
# STEP 4: Add points (provided)
geom_point(color = "#FF6B35", size = 3) +
# STEP 5: Highlight the peak year with a different point shape (provided)
geom_point(data = peak_year, aes(x = year, y = acres_burned),
color = "#8B0000", size = 5, shape = 18) +
# STEP 6: Add text label for peak year (provided)
geom_text(data = peak_year, aes(label = paste0(year, "\n", comma(round(acres_burned)), " acres")),
vjust = -0.8, hjust = 0.5, fontface = "bold", size = 3.5, color = "#8B0000") +
# STEP 7: Format axes (provided)
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = comma) +
# STEP 8: Add labels (provided)
labs(
title = "Total Acres Burned by Wildfires in California (2001-2024)",
subtitle = "Peak year highlighted | Fire severity shows increasing extremes",
x = "Year",
y = "Acres Burned",
caption = "Source: CALFIRE Data"
)
peak_year <- statewide %>% filter(acres_burned == max(acres_burned))
ggplot(data = statewide, aes(x = year, y = acres_burned)) +
geom_line(color = "#FF6B35", linewidth = 1.2) +
geom_point(color = "#FF6B35", size = 3) +
geom_point(data = peak_year, aes(x = year, y = acres_burned),
color = "#8B0000", size = 5, shape = 18) +
geom_text(data = peak_year, aes(label = paste0(year, "\n", comma(round(acres_burned)), " acres")),
vjust = -0.8, hjust = 0.5, fontface = "bold", size = 3.5, color = "#8B0000") +
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = comma) +
labs(
title = "Total Acres Burned by Wildfires in California (2001-2024)",
subtitle = "Peak year highlighted | Fire severity shows increasing extremes",
x = "Year",
y = "Acres Burned",
caption = "Source: CALFIRE Data"
)
SOLUTION:
2020 had the most acres burned with approximately 73,235 acres.
The patterns differ significantly from fire count:
The TIME article claims wildfires are becoming “larger.” Calculate the mean acres burned for the first half (2001-2012) vs. second half (2013-2024) of the dataset. Does this support the claim?
# Split data into halves
first_half <- statewide %>% filter(year <= 2012)
second_half <- statewide %>% filter(year > 2012)
# Calculate statistics for each half
comparison <- data.frame(
Period = c("2001-2012 (First Half)", "2013-2024 (Second Half)"),
Mean_Acres = c(mean(first_half$acres_burned), mean(second_half$acres_burned)),
Median_Acres = c(median(first_half$acres_burned), median(second_half$acres_burned)),
Max_Acres = c(max(first_half$acres_burned), max(second_half$acres_burned)),
Total_Acres = c(sum(first_half$acres_burned), sum(second_half$acres_burned))
)
print(comparison)
## Period Mean_Acres Median_Acres Max_Acres Total_Acres
## 1 2001-2012 (First Half) 12141.13 10297.43 26589.35 145693.5
## 2 2013-2024 (Second Half) 22531.71 14341.78 73235.29 270380.5
# Calculate percent increase
pct_increase_mean <- (comparison$Mean_Acres[2] - comparison$Mean_Acres[1]) / comparison$Mean_Acres[1] * 100
cat("\nPercent increase in mean acres burned:", round(pct_increase_mean, 1), "%\n")
##
## Percent increase in mean acres burned: 85.6 %
SOLUTION:
The data supports the claim that wildfires are becoming larger:
This is consistent with the TIME article’s claim that climate change has “exacerbated the size” of wildfires.
Using the same first-half (2001-2012) and second-half (2013-2024) split, calculate the difference in mean temperature between the two periods. Is this temperature change significant in the context of climate science?
# Calculate mean temperature for each period
temp_first_half <- mean(first_half$avg_temp_f, na.rm = TRUE)
temp_second_half <- mean(second_half$avg_temp_f, na.rm = TRUE)
# Calculate the difference
temp_difference <- temp_second_half - temp_first_half
cat("Mean Temperature (2001-2012):", round(temp_first_half, 2), "°F\n")
## Mean Temperature (2001-2012): 57.49 °F
cat("Mean Temperature (2013-2024):", round(temp_second_half, 2), "°F\n")
## Mean Temperature (2013-2024): 58.87 °F
cat("Temperature Increase:", round(temp_difference, 2), "°F\n")
## Temperature Increase: 1.38 °F
SOLUTION:
\[ \begin{array}{ccc} \text{1.38}^\circ \mathrm{F} \approx \text{0.77}^\circ \mathrm{C}\text{ Increase} & \Longrightarrow & \text{0.64}^\circ \mathrm{C}\,/\,\text{decade} & \Longrightarrow & \text{111% Burn Area Increase per }^\circ \mathrm{C} \\[6pt] \end{array} \]
The mean temperature increased by approximately 1.38°F (~0.77°C) between the two 12-year periods.
Is this significant? Yes, this is a substantial change in climate science terms:
Rate of change (°C): An increase of 0.77 °C over a 12-year period corresponds to a warming rate of approximately 0.64 °C per decade.
Global Context: This is 3.5 times faster than the global average warming rate of approximately 0.18 °C (0.32 °F) per decade observed since 1981.
Amplification Effect: According to Abatzoglou & Williams (2016) in Proceedings of the National Academy of Sciences (PNAS), each 1 °C of warming can increase wildfire area burned by 100–600% in the western U.S. due to drier vegetation and increased vapor pressure deficit.
Validating with our data: From Question 11, we found acres burned increased by ≈85.6% while temperature rose ≈0.77 °C. This yields an observed rate of ≈111% per °C (85.6% ÷ 0.77 °C), which falls precisely within the PNAS predicted range.
Key insight: While 1.38°F may seem small in everyday terms (you wouldn’t notice it on a thermometer), in climate systems it represents a significant shift that cascades through ecosystems.
Create a dual-axis or faceted plot showing both fire count and acres burned over time. What relationship do you observe between the number of fires and total area burned?
Instructions: Fill in the blanks to reshape data and create faceted visualization.
# STEP 1: Reshape data from wide to long format for faceting
# Hint: Select year, fire_count, and acres_burned columns
statewide_long <- statewide %>%
select(year, _____, _____) %>%
# STEP 2: Pivot to long format
# Hint: The cols argument specifies which columns to pivot
pivot_longer(cols = c(_____, _____),
names_to = "metric",
values_to = "value") %>%
# STEP 3: Create readable labels (provided)
mutate(metric = case_when(
metric == "fire_count" ~ "Number of Fires",
metric == "acres_burned" ~ "Acres Burned"
))
# STEP 4: Create faceted plot
# Hint: Map year to x, value to y, color by metric
ggplot(statewide_long, aes(x = _____, y = _____, color = _____)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2.5) +
# STEP 5: Create facets - one panel per metric
# Hint: Use scales = "free_y" so each panel has its own y-axis scale
facet_wrap(~metric, ncol = 1, scales = "_____") +
# STEP 6: Format axes and add labels (provided)
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = comma) +
scale_color_manual(values = c("Number of Fires" = "#1976D2", "Acres Burned" = "#D32F2F")) +
labs(
title = "California Wildfire Trends: Count vs. Severity (2001-2024)",
subtitle = "Fire count and acres burned show different patterns over time",
x = "Year",
y = NULL,
caption = "Source: CALFIRE Data"
) +
theme(legend.position = "none",
strip.background = element_rect(fill = "#ECF0F1", color = NA))
statewide_long <- statewide %>%
select(year, fire_count, acres_burned) %>%
pivot_longer(cols = c(fire_count, acres_burned),
names_to = "metric",
values_to = "value") %>%
mutate(metric = case_when(
metric == "fire_count" ~ "Number of Fires",
metric == "acres_burned" ~ "Acres Burned"
))
ggplot(statewide_long, aes(x = year, y = value, color = metric)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2.5) +
facet_wrap(~metric, ncol = 1, scales = "free_y") +
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = comma) +
scale_color_manual(values = c("Number of Fires" = "#1976D2", "Acres Burned" = "#D32F2F")) +
labs(
title = "California Wildfire Trends: Count vs. Severity (2001-2024)",
subtitle = "Fire count and acres burned show different patterns over time",
x = "Year",
y = NULL,
caption = "Source: CALFIRE Data"
) +
theme(legend.position = "none",
strip.background = element_rect(fill = "#ECF0F1", color = NA))
SOLUTION:
The relationship between fire count and acres burned is moderate but not perfectly correlated:
The article mentions climate change has extended the fire season. While we can’t directly measure season length, we can examine year-to-year variability. Calculate the coefficient of variation (CV = sd/mean × 100) for acres burned. What does high variability suggest?
# Calculate coefficient of variation
cv_acres <- (sd(statewide$acres_burned) / mean(statewide$acres_burned)) * 100
cv_firecount <- (sd(statewide$fire_count) / mean(statewide$fire_count)) * 100
cat("Coefficient of Variation (CV) for Acres Burned:", round(cv_acres, 1), "%\n")
## Coefficient of Variation (CV) for Acres Burned: 91.6 %
cat("Coefficient of Variation (CV) for Fire Count:", round(cv_firecount, 1), "%\n")
## Coefficient of Variation (CV) for Fire Count: 30.5 %
SOLUTION:
The coefficient of variation (CV) tells us how spread out the data is relative to its average value. When the spread is small compared to the average, outcomes are more predictable. When the spread is large compared to the average, outcomes are more unpredictable.
The coefficient of variation for acres burned is approximately 91.6%, which is very high. This means the standard deviation is nearly as large as the mean.
Interpretation:
Create a lollipop chart showing the top 10 most destructive fire years by acres burned. Which 3 years had the most devastating fires?
Instructions: Fill in the blanks to filter data and create the lollipop chart.
# STEP 1: Get top 10 years by acres burned
# Hint: Use arrange() with desc() to sort descending, then head() to get top 10
top10_years <- statewide %>%
arrange(_____(acres_burned)) %>%
_____(10) %>%
mutate(year = factor(year))
# STEP 2: Create the lollipop chart
# Hint: x = reordered year by acres_burned, y = acres_burned
ggplot(top10_years, aes(x = reorder(_____, _____), y = _____)) +
# STEP 3: Add the "stick" part of the lollipop
# Hint: geom_segment draws lines from y=0 to y=acres_burned
geom_segment(aes(xend = year, y = 0, yend = acres_burned),
color = "#D32F2F", linewidth = 1.2) +
# STEP 4: Add the "candy" part (point at the top)
geom_point(color = "#8B0000", size = 5) +
# STEP 5: Add value labels (provided)
geom_text(aes(label = comma(round(acres_burned))),
hjust = -0.2, size = 3.5, fontface = "bold") +
# STEP 6: Flip coordinates for horizontal display
coord_flip() +
# STEP 7: Format and label (provided)
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 10 Most Destructive Wildfire Years in California",
subtitle = "Ranked by total acres burned | 2020 stands out as an extreme outlier",
x = "Year",
y = "Acres Burned",
caption = "Source: CALFIRE Data 2001-2024"
)
top10_years <- statewide %>%
arrange(desc(acres_burned)) %>%
head(10) %>%
mutate(year = factor(year))
ggplot(top10_years, aes(x = reorder(year, acres_burned), y = acres_burned)) +
geom_segment(aes(xend = year, y = 0, yend = acres_burned),
color = "#D32F2F", linewidth = 1.2) +
geom_point(color = "#8B0000", size = 5) +
geom_text(aes(label = comma(round(acres_burned))),
hjust = -0.2, size = 3.5, fontface = "bold") +
coord_flip() +
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 10 Most Destructive Wildfire Years in California",
subtitle = "Ranked by total acres burned | 2020 stands out as an extreme outlier",
x = "Year",
y = "Acres Burned",
caption = "Source: CALFIRE Data 2001-2024"
)
SOLUTION:
The three most devastating fire years were:
Notably, 6 of the top 10 worst fire years occurred in the second half of the dataset (2013-2024), further supporting the claim that fires are becoming more severe.
Using the county dataset, calculate the total acres burned across all 58 counties. Create a horizontal bar chart showing the top 10 counties by total acres burned.
Instructions: Fill in the blanks to create the bar chart.
# STEP 1: Get top 10 counties by acres burned
# Hint: Similar to previous question - arrange descending, take top 10
top10_counties_acres <- county_data %>%
arrange(_____(acres_burned)) %>%
_____(10)
# STEP 2: Create horizontal bar chart
# Hint: x = reordered county, y = acres_burned, fill = acres_burned for color gradient
ggplot(top10_counties_acres, aes(x = reorder(county, acres_burned), y = acres_burned, fill = _____)) +
# STEP 3: Add bars (provided)
geom_col(width = 0.7) +
# STEP 4: Add value labels (provided)
geom_text(aes(label = comma(round(acres_burned))),
hjust = -0.1, size = 3.5, fontface = "bold") +
# STEP 5: Flip to horizontal
coord_flip() +
# STEP 6: Use inferno color palette
# Hint: viridis has options: "viridis", "magma", "plasma", "inferno", "cividis"
scale_fill_viridis(option = "_____", direction = -1, guide = "none") +
# STEP 7: Format and label (provided)
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 10 California Counties by Acres Burned (2019-2023)",
subtitle = "Plumas County experienced the most fire damage by area",
x = NULL,
y = "Total Acres Burned",
caption = "Source: CALFIRE Data"
)
top10_counties_acres <- county_data %>%
arrange(desc(acres_burned)) %>%
head(10)
ggplot(top10_counties_acres, aes(x = reorder(county, acres_burned), y = acres_burned, fill = acres_burned)) +
geom_col(width = 0.7) +
geom_text(aes(label = comma(round(acres_burned))),
hjust = -0.1, size = 3.5, fontface = "bold") +
coord_flip() +
scale_fill_viridis(option = "inferno", direction = -1, guide = "none") +
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 10 California Counties by Acres Burned (2019-2023)",
subtitle = "Plumas County experienced the most fire damage by area",
x = NULL,
y = "Total Acres Burned",
caption = "Source: CALFIRE Data"
)
SOLUTION:
The top 10 counties by acres burned are predominantly in Northern California and the Sierra Nevada region:
These are largely rural, forested counties with extensive wildland areas.
Create a similar chart for homes damaged. Are the counties with the most acres burned the same as those with the most homes damaged? Name any counties that appear in one top 10 but not the other.
Note: The homes_damaged variable counts
residential structures that sustained greater than 50%
damage from wildfires—meaning homes that were destroyed or
significantly damaged beyond half their value.
# Top 10 counties by homes damaged
top10_counties_homes <- county_data %>%
arrange(desc(homes_damaged)) %>%
head(10)
ggplot(top10_counties_homes, aes(x = reorder(county, homes_damaged), y = homes_damaged, fill = homes_damaged)) +
geom_col(width = 0.7) +
geom_text(aes(label = comma(homes_damaged)),
hjust = -0.1, size = 3.5, fontface = "bold") +
coord_flip() +
scale_fill_viridis(option = "magma", direction = -1, guide = "none") +
scale_y_continuous(labels = comma, expand = expansion(mult = c(0, 0.15))) +
labs(
title = "Top 10 California Counties by Homes Damaged (2019-2023)",
subtitle = "Butte County leads due to the devastating Camp Fire aftermath",
x = NULL,
y = "Homes Damaged or Destroyed (>50% damage)",
caption = "Source: CALFIRE Data"
)
# Compare the two lists
acres_top10 <- top10_counties_acres$county
homes_top10 <- top10_counties_homes$county
# Counties in acres top 10 but not homes top 10
only_acres <- setdiff(acres_top10, homes_top10)
cat("Counties in top 10 for ACRES but not HOMES:", paste(only_acres, collapse = ", "), "\n")
## Counties in top 10 for ACRES but not HOMES: Trinity, Tehama, Tulare, Lassen, Del Norte, Los Angeles
# Counties in homes top 10 but not acres top 10
only_homes <- setdiff(homes_top10, acres_top10)
cat("Counties in top 10 for HOMES but not ACRES:", paste(only_homes, collapse = ", "), "\n")
## Counties in top 10 for HOMES but not ACRES: Butte, Santa Cruz, Napa, Sonoma, Solano, Shasta
SOLUTION:
The lists are substantially different, highlighting a critical insight:
Counties in both top 10 lists: Plumas, Siskiyou, Butte, Shasta
In acres top 10 but NOT homes top 10: Trinity,
Tehama, Tulare, Del Norte, Fresno, Lassen
In homes top 10 but NOT acres top 10: Santa Cruz, Napa,
El Dorado, Sonoma, Solano, Los Angeles
Key insight: Large areas burned in remote, unpopulated counties may cause fewer home losses than smaller fires in densely populated WUI areas. Where fires occur matters as much as how large they are. Remember that homes damaged refers to structures with >50% damage, so these represent significant losses.
The TIME article emphasizes that many people “living in the
west are living in areas prone to wildfire.” Calculate the correlation
between prop_wui (proportion living in WUI) and
homes_damaged. Interpret this value.
# Calculate correlation
cor_wui_homes <- cor(county_data$prop_wui, county_data$homes_damaged, use = "complete.obs")
cat("Correlation between prop_wui and homes_damaged:", round(cor_wui_homes, 3), "\n")
## Correlation between prop_wui and homes_damaged: 0.332
# Test significance
cor_test <- cor.test(county_data$prop_wui, county_data$homes_damaged)
print(cor_test)
##
## Pearson's product-moment correlation
##
## data: county_data$prop_wui and county_data$homes_damaged
## t = 2.6323, df = 56, p-value = 0.01094
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08041755 0.54353170
## sample estimates:
## cor
## 0.3318209
SOLUTION:
The correlation coefficient is approximately 0.33, indicating a weak positive relationship between the proportion of a county in the WUI and homes damaged (>50% damage).
Interpretation:
Create a scatter plot with prop_wui on the
x-axis and homes_damaged on the y-axis. Add county labels
for the 5 counties with the most home damage. What pattern do you
observe?
# Identify top 5 counties for labeling
top5_homes <- county_data %>%
arrange(desc(homes_damaged)) %>%
head(5)
ggplot(county_data, aes(x = prop_wui, y = homes_damaged)) +
geom_point(aes(size = population/1000000), alpha = 0.6, color = "#D32F2F") +
geom_smooth(method = "lm", se = TRUE, color = "#1976D2", linetype = "dashed") +
geom_label_repel(data = top5_homes,
aes(label = county),
size = 3.5,
fontface = "bold",
box.padding = 0.5,
point.padding = 0.3,
segment.color = "gray50",
max.overlaps = 20) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_y_continuous(labels = comma) +
scale_size_continuous(name = "Population (millions)", range = c(2, 10)) +
labs(
title = "Wildland-Urban Interface and Home Damage by County",
subtitle = "Counties with most home damage labeled | Point size indicates population",
x = "Proportion of County in WUI",
y = "Homes Damaged or Destroyed (>50% damage)",
caption = "Source: CAL FIRE (2001–2024) and U.S. Census Bureau, ACS 5-Year Estimates (2023)"
) +
theme(legend.position = "right")
SOLUTION:
Observed patterns:
Policy implication: WUI exposure is a risk factor, but other variables (fire behavior, evacuation success, building codes) strongly influence outcomes.
The TIME article states: “Climate change has exacerbated the
size, intensity, and damage caused by wildfires.” We’ll investigate the
relationship between temperature and fire outcomes. First, create a
scatter plot of avg_temp_f vs. acres_burned
with a trend line using geom_smooth().
Instructions: Fill in the blanks to create the scatter plot.
# STEP 1: Convert drought_level to ordered factor for proper color mapping
# (This code is provided and should run first)
statewide$drought_level <- factor(
statewide$drought_level,
levels = c(
"None",
"Abnormally Dry",
"Moderate",
"Severe",
"Extreme",
"Exceptional"
),
ordered = TRUE
)
# STEP 2: Create scatter plot
# Hint: x = avg_temp_f, y = acres_burned
ggplot(statewide, aes(x = _____, y = _____)) +
# STEP 3: Add points colored by drought level
# Hint: Put color = drought_level inside aes()
geom_point(aes(color = _____), size = 4, alpha = 0.8) +
# STEP 4: Add linear trend line
# Hint: method = "lm" for linear model
geom_smooth(method = "_____", se = TRUE, color = "#1976D2", linetype = "dashed") +
# STEP 5: Apply custom colors for drought levels (provided)
scale_color_manual(values = drought_colors, name = "Drought Level") +
scale_y_continuous(labels = comma) +
labs(
title = "Relationship Between Temperature and Fire Severity",
subtitle = "Higher temperatures tend to correlate with more acres burned",
x = "Average Annual Temperature (F)",
y = "Acres Burned",
caption = "Sources: CAL FIRE (2001-2024), NOAA, and National Drought Mitigation Center"
) +
theme(legend.position = "bottom")
ggplot(statewide, aes(x = avg_temp_f, y = acres_burned)) +
geom_point(aes(color = drought_level), size = 4, alpha = 0.8) +
geom_smooth(method = "lm", se = TRUE, color = "#1976D2", linetype = "dashed") +
scale_color_manual(values = drought_colors, name = "Drought Level") +
scale_y_continuous(labels = comma) +
labs(
title = "Relationship Between Temperature and Fire Severity",
subtitle = "Higher temperatures tend to correlate with more acres burned",
x = "Average Annual Temperature (F)",
y = "Acres Burned",
caption = "Sources: CAL FIRE (2001-2024), NOAA, and National Drought Mitigation Center"
) +
theme(legend.position = "bottom")
SOLUTION:
The scatter plot reveals a positive relationship between temperature and acres burned. Warmer years tend to experience more severe fire seasons. The points are colored by drought level, showing that the most extreme fire years often coincide with drought conditions.
Calculate the Pearson correlation coefficient between
avg_temp_f and acres_burned. Is the
relationship positive or negative? Is it strong or weak?
# Calculate Pearson correlation
cor_temp_acres <- cor(statewide$avg_temp_f, statewide$acres_burned, use = "complete.obs")
cat("Pearson correlation (temp vs acres):", round(cor_temp_acres, 3), "\n")
## Pearson correlation (temp vs acres): 0.494
# Formal test
cor_test_temp <- cor.test(statewide$avg_temp_f, statewide$acres_burned)
print(cor_test_temp)
##
## Pearson's product-moment correlation
##
## data: statewide$avg_temp_f and statewide$acres_burned
## t = 2.662, df = 22, p-value = 0.01424
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1126125 0.7480403
## sample estimates:
## cor
## 0.4935868
Interpretation Guide:
|r| < 0.3: Weak
0.3 ≤ |r| < 0.5: Moderate
0.5 ≤ |r| < 0.7: Strong
|r| ≥ 0.7: Very Strong
SOLUTION:
The Pearson correlation coefficient is approximately 0.49, indicating a moderate positive relationship.
This supports the claim that warming temperatures are associated with increased fire severity.
Now examine the relationship between temperature and
tree_loss_acres. Create a scatter plot with a loess
smoother. Describe the relationship.
ggplot(statewide, aes(x = avg_temp_f, y = tree_loss_acres)) +
geom_point(color = "#2E7D32", size = 4, alpha = 0.7) +
geom_smooth(method = "loess", se = TRUE, color = "#8B4513", span = 0.75) +
scale_y_continuous(labels = comma) +
labs(
title = "Temperature and Tree Cover Loss in California (2001-2024)",
subtitle = "LOESS smoother reveals non-linear relationship with threshold around 58°F",
x = "Average Annual Temperature (°F)",
y = "Tree Cover Loss (acres)",
caption = "Sources: NOAA and Global Forest Watch"
)
SOLUTION:
The relationship between temperature and tree loss appears non-linear:
The LOESS smoother captures this pattern better than a linear fit would.
Note: Tree cover loss includes all causes (wildfires, logging, agriculture, natural disturbances), not just fire-related loss.
Create a correlation matrix heatmap for all numeric variables in the statewide dataset. Which two variables have the strongest positive correlation?
# Select numeric variables
numeric_vars <- statewide %>%
select(year,fire_count, acres_burned, avg_temp_f, tree_loss_acres, carbon_emissions)
# Calculate correlation matrix
cor_matrix <- cor(numeric_vars, use = "complete.obs")
# Better visualization with corrplot
corrplot(cor_matrix,
method = "color",
type = "upper",
order = "hclust",
addCoef.col = "black",
number.cex = 0.8,
tl.col = "black",
tl.srt = 45,
col = colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(100),
title = "Correlation Matrix: California Wildfire Variables",
mar = c(0, 0, 2, 0))
SOLUTION:
The strongest positive correlation is between
tree_loss_acres and
carbon_emissions with r ≈
0.99.
Interpretation:
acres_burned & tree_loss_acres (r ≈
0.81)acres_burned & carbon_emissions (r ≈
0.75)year shows positive correlations with several
variables, indicating that wildfire-related impacts have generally
increased over time.The article discusses how drought conditions affect fire
severity. Create side-by-side boxplots comparing
acres_burned across different drought_level
categories.
Instructions: Fill in the blanks to create the boxplot visualization.
# STEP 1: Ensure drought levels are ordered by severity
statewide <- statewide %>%
mutate(drought_level = factor(drought_level,
levels = c("Abnormally Dry", "Moderate", "Severe",
"Extreme", "Exceptional")))
# STEP 2: Create boxplot
# Hint: x = drought_level, y = acres_burned, fill = drought_level
ggplot(statewide, aes(x = _____, y = _____, fill = _____)) +
# STEP 3: Add boxplot geometry
geom_boxplot(alpha = 0.8, outlier.shape = 21, outlier.size = 3) +
# STEP 4: Add individual points with jitter
geom_jitter(width = 0.1, alpha = 0.5, size = 2) +
# STEP 5: Apply drought color palette (provided)
scale_fill_manual(values = drought_colors, guide = "none") +
scale_y_continuous(labels = comma) +
labs(
title = "Fire Severity by Drought Level in California",
subtitle = "Boxplots show distribution of acres burned for each drought category",
x = "Drought Level",
y = "Acres Burned",
caption = "Source: National Drought Mitigation Center Data 2001-2024"
) +
theme(axis.text.x = element_text(angle = 15, hjust = 1))
statewide <- statewide %>%
mutate(drought_level = factor(drought_level,
levels = c("Abnormally Dry", "Moderate", "Severe",
"Extreme", "Exceptional")))
ggplot(statewide, aes(x = drought_level, y = acres_burned, fill = drought_level)) +
geom_boxplot(alpha = 0.8, outlier.shape = 21, outlier.size = 3) +
geom_jitter(width = 0.1, alpha = 0.5, size = 2) +
scale_fill_manual(values = drought_colors, guide = "none") +
scale_y_continuous(labels = comma) +
labs(
title = "Fire Severity by Drought Level in California",
subtitle = "Boxplots show distribution of acres burned for each drought category",
x = "Drought Level",
y = "Acres Burned",
caption = "Source: National Drought Mitigation Center Data 2001-2024"
) +
theme(axis.text.x = element_text(angle = 15, hjust = 1))
SOLUTION:
The boxplots reveal no simple linear relationship between drought severity and acres burned:
Calculate the median acres burned for each drought level. Do more severe droughts correspond to more acres burned?
# Calculate median acres by drought level
drought_summary <- statewide %>%
group_by(drought_level) %>%
summarise(
n_years = n(),
median_acres = median(acres_burned),
mean_acres = mean(acres_burned),
max_acres = max(acres_burned)
) %>%
arrange(factor(drought_level, levels = c("Abnormally Dry", "Moderate", "Severe",
"Extreme", "Exceptional")))
print(drought_summary)
## # A tibble: 5 × 5
## drought_level n_years median_acres mean_acres max_acres
## <fct> <int> <dbl> <dbl> <dbl>
## 1 Abnormally Dry 10 6526. 11271. 25296.
## 2 Moderate 6 21820. 26986. 73235.
## 3 Severe 3 12902. 15219. 22388.
## 4 Extreme 3 11147. 22370. 48956.
## 5 Exceptional 2 14342. 14342. 15782.
SOLUTION:
The relationship is not straightforward:
Key insight: At the state level, years classified as having “moderate” drought show the highest typical (median) acres burned. However, this pattern can be misleading for two important reasons:
State averages hide local extremes A year labeled as “moderate drought” for the entire state may still include counties experiencing severe or extreme drought, which are the places where large fires are most likely to occur. When we average conditions across the whole state, we lose this local detail.
Annual data misses rapid climate shifts within a year Yearly drought categories cannot capture climate whiplash—for example, a wet winter that increases vegetation growth followed by a hot, dry summer that dries out that fuel and leads to fires. These short-term seasonal changes are important drivers of wildfire risk but are hidden when data are summarized by year.
Using the county data, examine the relationship between
drought_level and prop_area_burned. Create a
violin plot with overlaid points. What patterns emerge?
# Order drought levels
county_data <- county_data %>%
mutate(drought_level = factor(drought_level,
levels = c("Abnormally Dry", "Moderate", "Severe")))
ggplot(county_data, aes(x = drought_level, y = prop_area_burned, fill = drought_level)) +
geom_violin(alpha = 0.7, scale = "width") +
geom_jitter(width = 0.15, alpha = 0.6, size = 2, color = "#2C3E50") +
geom_boxplot(width = 0.15, fill = "white", alpha = 0.8, outlier.shape = NA) +
scale_fill_manual(values = c("Abnormally Dry" = "#FFEDA0",
"Moderate" = "#FEB24C",
"Severe" = "#F03B20"),
guide = "none") +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Proportion of County Area Burned by Drought Level",
subtitle = "Violin plots show distribution shape | Individual counties shown as points",
x = "Drought Level",
y = "Proportion of County Area Burned",
caption = "Source: National Drought Mitigation Center 2019-2023"
)
SOLUTION:
Patterns observed:
Create a scatter plot matrix (using
GGally::ggpairs() or faceting) showing relationships
between avg_temp_f, tree_loss_acres,
carbon_emissions, and acres_burned. Describe
what you observe.
# Select variables for pairs plot
pairs_data <- statewide %>%
select(avg_temp_f, tree_loss_acres, carbon_emissions, acres_burned) %>%
rename(
`Temp (°F)` = avg_temp_f,
`Tree Loss (acres)` = tree_loss_acres,
`Carbon (Mg)` = carbon_emissions,
`Acres Burned` = acres_burned
)
# Create pairs plot
ggpairs(pairs_data,
upper = list(continuous = wrap("cor", size = 4, color = "#2C3E50")),
lower = list(continuous = wrap("smooth", alpha = 0.5, color = "#D32F2F")),
diag = list(continuous = wrap("densityDiag", fill = "#FFA726", alpha = 0.6))) +
labs(title = "Relationships Between Key Wildfire Variables") +
theme_wildfire
SOLUTION:
Key observations from the pairs plot:
The climate data shows carbon emissions from tree cover loss. Create a stacked area chart showing cumulative carbon emissions over time. Has California’s tree cover loss-related carbon footprint been increasing?
Important Note: The carbon_emissions
variable represents CO₂ released from all tree cover
loss in California, not just wildfires. This includes emissions
from wildfires, logging, agricultural conversion, natural disturbances,
and development. While wildfires are a major driver, these figures
represent total forest carbon impact.
# Calculate cumulative emissions
statewide <- statewide %>%
arrange(year) %>%
mutate(cumulative_carbon = cumsum(carbon_emissions))
ggplot(statewide, aes(x = year, y = carbon_emissions)) +
geom_area(fill = "#D32F2F", alpha = 0.7) +
geom_line(color = "#8B0000", linewidth = 1) +
geom_point(color = "#8B0000", size = 2) +
scale_x_continuous(breaks = seq(2001, 2024, by = 2)) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M")) +
labs(
title = "Annual Carbon Emissions from Tree Cover Loss in California",
subtitle = "Emissions from all tree cover loss (fires, logging, development, etc.) | Recent peaks in 2020-2021",
x = "Year",
y = "Carbon Emissions (Million Mg)",
caption = "Source: Global Forest Watch Data 2001-2024 | Includes all drivers of tree cover loss"
)
SOLUTION:
The chart shows highly variable carbon emissions with notable spikes:
Yes, California’s tree cover loss-related carbon footprint has been increasing, particularly since 2017. Note that these emissions include all sources of tree cover loss, with wildfires being a major but not exclusive contributor.
Calculate the total carbon emissions for the entire 24-year period. Express this in a relatable unit (hint: 1 megagram = 1 metric ton; average car emits ~4.6 tons/year). How many “car-years” of emissions is this?
# Total emissions
total_carbon_mg <- sum(statewide$carbon_emissions)
cat("Total carbon emissions (2001-2024):", format(total_carbon_mg, big.mark = ","), "Mg\n")
## Total carbon emissions (2001-2024): 1,695,100,582 Mg
# Convert to metric tons (1 Mg = 1 metric ton)
total_carbon_tons <- total_carbon_mg
# Average car emissions per year
car_emissions_per_year <- 4.6 # metric tons
# Calculate car-years equivalent
car_years <- total_carbon_tons / car_emissions_per_year
cat("Equivalent car-years of emissions:", format(round(car_years), big.mark = ","), "\n")
## Equivalent car-years of emissions: 368,500,127
# Additional context
us_cars <- 283000000 # approximate US registered vehicles
years_all_us_cars <- car_years / us_cars
cat("Equivalent to all US cars driving for:", round(years_all_us_cars, 2), "years\n")
## Equivalent to all US cars driving for: 1.3 years
SOLUTION:
In relatable terms: California’s tree cover loss over 24 years produced carbon emissions equivalent to all registered vehicles in the United States (~283 million cars) driving for about 1.3 years.
Important context: These emissions include all tree cover loss (wildfires, logging, agriculture, development, natural disturbances), not just wildfires. However, in California, wildfires are a major driver of tree cover loss, especially in recent years.
Using the county data, create a bubble plot with
median_income on x-axis, homes_damaged on
y-axis, and bubble size representing population. What
socioeconomic patterns do you observe?
# Create bubble plot
ggplot(county_data, aes(x = median_income, y = homes_damaged, size = population)) +
geom_point(alpha = 0.6, color = "#D32F2F") +
geom_smooth(method = "lm", se = FALSE, color = "#1976D2", linetype = "dashed",
aes(weight = NULL), show.legend = FALSE) +
geom_label_repel(data = county_data %>% arrange(desc(homes_damaged)) %>% head(5),
aes(label = county),
size = 3,
box.padding = 0.5,
max.overlaps = 15) +
scale_x_continuous(labels = dollar_format()) +
scale_y_continuous(labels = comma) +
scale_size_continuous(name = "Population",
range = c(2, 15),
labels = label_number(scale = 1e-6, suffix = "M")) +
labs(
title = "Income, Population, and Wildfire Home Damage",
subtitle = "Bubble size indicates county population | Top 5 damaged counties labeled",
x = "Median Household Income",
y = "Homes Damaged or Destroyed (>50% damage)",
caption = "Source: CALFIRE Data 2019-2023 and U.S. Census Bureau, ACS 5-Year Estimates (2023)"
) +
theme(legend.position = "right")
SOLUTION:
Socioeconomic patterns observed:
Key takeaway: These patterns suggest that wildfire risk is influenced more by geographic and environmental factors than by socioeconomic characteristics. Further analysis controlling for WUI exposure and regional location would be needed to confirm this relationship.
The Gini index measures income inequality (higher = more
unequal). Is there a relationship between gini_index and
prop_area_burned? Create an appropriate visualization and
calculate the correlation.
# Calculate correlation
cor_gini_burn <- cor(county_data$gini_index, county_data$prop_area_burned, use = "complete.obs")
cat("Correlation between Gini index and proportion burned:", round(cor_gini_burn, 3), "\n")
## Correlation between Gini index and proportion burned: 0.165
# Create scatter plot
ggplot(county_data, aes(x = gini_index, y = prop_area_burned)) +
geom_point(aes(color = region), size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE, color = "#2C3E50", linetype = "dashed") +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_color_brewer(palette = "Set3", name = "Region") +
labs(
title = "Income Inequality and Wildfire Impact by County",
subtitle = paste0("Correlation: r = ", round(cor_gini_burn, 3), " | Weak positive relationship"),
x = "Gini Index (Income Inequality)",
y = "Proportion of County Area Burned",
caption = "Source: CALFIRE Data 2019-2023"
) +
theme(legend.position = "bottom",
legend.text = element_text(size = 8)) +
guides(color = guide_legend(nrow = 2))
SOLUTION:
The correlation between Gini index and proportion burned is approximately 0.16 — a very weak positive relationship.
Interpretation:
The county dataset separates fires by cause:
fires_human, fires_natural, and
fires_other. Calculate the total fires in each category
statewide. What percentage of fires are human-caused?
# Calculate totals
fire_causes <- county_data %>%
summarise(
Human = sum(fires_human, na.rm = TRUE),
Natural = sum(fires_natural, na.rm = TRUE),
Other = sum(fires_other, na.rm = TRUE)
) %>%
pivot_longer(cols = everything(), names_to = "Cause", values_to = "Count")
# Add percentages
total_fires <- sum(fire_causes$Count)
fire_causes <- fire_causes %>%
mutate(Percentage = round(Count / total_fires * 100, 1))
print(fire_causes)
## # A tibble: 3 × 3
## Cause Count Percentage
## <chr> <dbl> <dbl>
## 1 Human 298 12.1
## 2 Natural 317 12.9
## 3 Other 1848 75
cat("\n--- Summary ---\n")
##
## --- Summary ---
cat("Total fires:", total_fires, "\n")
## Total fires: 2463
cat("Human-caused percentage:", fire_causes$Percentage[fire_causes$Cause == "Human"], "%\n")
## Human-caused percentage: 12.1 %
SOLUTION:
Only about 14% of fires are confirmed human-caused, while the majority fall into “other/unknown” category. This highlights the challenges of fire cause attribution.
Create a stacked bar chart showing the composition of fire causes for each of the 10 regions. Which region has the highest proportion of human-caused fires?
# Summarize by region
region_causes <- county_data %>%
group_by(region) %>%
summarise(
Human = sum(fires_human, na.rm = TRUE),
Natural = sum(fires_natural, na.rm = TRUE),
Other = sum(fires_other, na.rm = TRUE)
) %>%
pivot_longer(cols = c(Human, Natural, Other), names_to = "Cause", values_to = "Count") %>%
group_by(region) %>%
mutate(Proportion = Count / sum(Count))
# Calculate human proportion by region
human_prop <- region_causes %>%
filter(Cause == "Human") %>%
arrange(desc(Proportion))
cat("Regions ranked by proportion of human-caused fires:\n")
## Regions ranked by proportion of human-caused fires:
print(human_prop %>% select(region, Proportion) %>% mutate(Proportion = percent(Proportion, accuracy = 0.1)))
## # A tibble: 10 × 2
## # Groups: region [10]
## region Proportion
## <chr> <chr>
## 1 Northern San Joaquin Valley 20.2%
## 2 San Francisco Bay Area 18.4%
## 3 North Coast 17.8%
## 4 San Diego-Imperial 16.2%
## 5 Central Coast 14.8%
## 6 Inland Empire 13.4%
## 7 Superior California 13.3%
## 8 Los Angeles County 6.5%
## 9 Southern San Joaquin Valley 4.6%
## 10 Orange County 3.6%
# Create stacked bar chart
ggplot(region_causes, aes(x = reorder(region, -Count), y = Count, fill = Cause)) +
geom_col(position = "fill", width = 0.8) +
coord_flip() +
scale_fill_manual(values = c("Human" = "#E74C3C", "Natural" = "#3498DB", "Other" = "#95A5A6"),
name = "Fire Cause") +
scale_y_continuous(labels = percent_format()) +
labs(
title = "Fire Cause Composition by California Region",
subtitle = "Northern San Joaquin Valley has highest proportion of human-caused fires",
x = NULL,
y = "Proportion of Fires",
caption = "Source: CALFIRE Data 2019-2023"
) +
theme(legend.position = "bottom")
SOLUTION:
Northern San Joaquin Valley has the highest proportion of human-caused fires (approximately 20%), followed by the San Francisco Bay Area region (about 18.4%).
Key insights:
Create a Cleveland dot plot comparing human-caused vs. natural fires for the top 15 counties by total fires. Which counties have notably more natural (lightning) fires?
# Get top 15 counties by total fires
top15_fires <- county_data %>%
mutate(total_fires = fires_human + fires_natural + fires_other) %>%
arrange(desc(total_fires)) %>%
head(15) %>%
select(county, fires_human, fires_natural, total_fires) %>%
pivot_longer(cols = c(fires_human, fires_natural),
names_to = "Cause",
values_to = "Count") %>%
mutate(Cause = ifelse(Cause == "fires_human", "Human-Caused", "Natural (Lightning)"))
ggplot(top15_fires, aes(x = Count, y = reorder(county, total_fires), color = Cause)) +
geom_line(aes(group = county), color = "gray70", linewidth = 1) +
geom_point(size = 4) +
scale_color_manual(values = c("Human-Caused" = "#E74C3C", "Natural (Lightning)" = "#3498DB"),
name = "Fire Cause") +
labs(
title = "Human-Caused vs. Natural Fires by County",
subtitle = "Top 15 counties by total fire count | Connected dots show gap between causes",
x = "Number of Fires",
y = NULL,
caption = "Source: CALFIRE Data 2019-2023"
) +
theme(legend.position = "bottom")
SOLUTION:
Counties with notably more natural (lightning) fires:
These are all remote, mountainous, forested counties in Northern California where lightning storms are common and human activity is limited.
The article quotes an expert: “Wildfires will continue to
cause devastation as long as areas that were previously natural
vegetation are commercially developed.” Calculate the correlation
between acres_wui and
homes_damaged.
# Calculate correlation
cor_wui_acres_homes <- cor(county_data$acres_wui, county_data$homes_damaged, use = "complete.obs")
cat("Correlation between WUI acres and homes damaged:", round(cor_wui_acres_homes, 3), "\n")
## Correlation between WUI acres and homes damaged: 0.173
# Test significance
cor_test_wui <- cor.test(county_data$acres_wui, county_data$homes_damaged)
print(cor_test_wui)
##
## Pearson's product-moment correlation
##
## data: county_data$acres_wui and county_data$homes_damaged
## t = 1.3144, df = 56, p-value = 0.1941
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08928845 0.41284515
## sample estimates:
## cor
## 0.1729971
SOLUTION:
The correlation is approximately 0.17 — a very weak positive relationship.
This surprisingly weak correlation suggests that simply having more WUI area doesn’t directly predict home damage (>50% structural damage). Other factors matter more: fire behavior, evacuation success, firefighting resources, and building materials.
Create a scatter plot with prop_wui on x-axis
and homes_damaged per 1000 residents (calculate this) on
y-axis. Add a regression line. What does the slope tell us?
# Calculate damage rate per 1000 residents
county_data <- county_data %>%
mutate(damage_per_1000 = (homes_damaged / population) * 1000)
# Fit linear model
lm_fit <- lm(damage_per_1000 ~ prop_wui, data = county_data)
summary(lm_fit)
##
## Call:
## lm(formula = damage_per_1000 ~ prop_wui, data = county_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.282 -2.929 -2.774 -1.732 64.051
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7815 2.0029 1.389 0.17
## prop_wui 0.8745 6.3161 0.138 0.89
##
## Residual standard error: 9.416 on 56 degrees of freedom
## Multiple R-squared: 0.0003422, Adjusted R-squared: -0.01751
## F-statistic: 0.01917 on 1 and 56 DF, p-value: 0.8904
# Create scatter plot
ggplot(county_data, aes(x = prop_wui, y = damage_per_1000)) +
geom_point(aes(color = region), size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE, color = "#2C3E50", linetype = "dashed") +
geom_label_repel(data = county_data %>% arrange(desc(damage_per_1000)) %>% head(5),
aes(label = county),
size = 3,
max.overlaps = 15) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_color_brewer(palette = "Set3", name = "Region") +
labs(
title = "WUI Exposure and Home Damage Rate by County",
subtitle = "Homes with >50% damage per 1,000 residents | Higher WUI → Higher damage rate",
x = "Proportion of County in WUI",
y = "Homes Damaged (>50%) per 1,000 Residents",
caption = "Source: CALFIRE Data 2019-2023"
) +
theme(legend.position = "bottom",
legend.text = element_text(size = 8)) +
guides(color = guide_legend(nrow = 2))
SOLUTION:
The regression slope is approximately 0.9, meaning:
Interpretation: For every 10 percentage point increase in WUI proportion, we expect approximately 0.09 more homes with >50% damage per 1,000 residents.
Key insight: This relationship is not statistically significant (p ≈ 0.89), and the model explains virtually none of the variation in per-capita wildfire damage.
Using mutate(), create a new variable
fires_total = fires_human + fires_natural + fires_other.
Then create a density ridge plot showing the distribution of total fires
by region.
# Create total fires variable
county_data <- county_data %>%
mutate(fires_total = fires_human + fires_natural + fires_other)
ggplot(county_data, aes(x = fires_total, y = reorder(region, fires_total, FUN = median),
fill = stat(x))) +
geom_density_ridges_gradient(scale = 2, rel_min_height = 0.01, gradient_lwd = 0.5) +
scale_fill_viridis(option = "inferno", name = "Total Fires") +
scale_x_continuous(expand = c(0, 0)) +
labs(
title = "Distribution of Total Fires by California Region",
subtitle = "Density ridges show how fire counts vary within each region",
x = "Total Number of Fires per County",
y = NULL,
caption = "Source: CALFIRE Data 2019-2023"
) +
theme(legend.position = "right")
Interpretation: Wider distributions indicate greater county-to-county variability in wildfire activity, while narrow distributions reflect more homogeneous fire patterns within regions.
SOLUTION:
Patterns from the ridge plot:
Limitations of this visualization:
When to use ridge plots vs. boxplots: - Ridge plots: Large samples, interest in distribution shape (bimodality, skewness) - Boxplots: Smaller samples, interest in comparing medians and outliers
The elderly population may be particularly vulnerable. Create
a scatter plot of prop_65_older
vs. homes_damaged. Is there evidence that counties with
older populations experience more home damage?
# Calculate correlation
cor_elderly <- cor(county_data$prop_65_older, county_data$homes_damaged, use = "complete.obs")
cat("Correlation between elderly proportion and homes damaged:", round(cor_elderly, 3), "\n")
## Correlation between elderly proportion and homes damaged: 0.088
ggplot(county_data, aes(x = prop_65_older, y = homes_damaged)) +
geom_point(aes(color = drought_level), size = 3, alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE, color = "#2C3E50", linetype = "dashed") +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_y_continuous(labels = comma) +
scale_color_manual(values = c("Abnormally Dry" = "#FFEDA0",
"Moderate" = "#FEB24C",
"Severe" = "#F03B20"),
name = "Drought Level") +
labs(
title = "Elderly Population and Wildfire Home Damage",
subtitle = paste0("Correlation: r = ", round(cor_elderly, 3), " | Weak positive relationship"),
x = "Proportion of Population Age 65+",
y = "Homes Damaged or Destroyed (>50% damage)",
caption = "Source: CALFIRE Data 2019-2023"
)
SOLUTION:
The correlation is approximately 0.09 — a weak positive relationship.
Interpretation:
Create a small multiples visualization (facet_wrap) showing the relationship between acres burned and homes damaged, faceted by drought level. Does drought severity change this relationship?
ggplot(county_data, aes(x = acres_burned, y = homes_damaged)) +
geom_point(aes(color = region), size = 2, alpha = 0.7) +
geom_smooth(method = "lm", se = FALSE, color = "#2C3E50", linewidth = 0.8) +
facet_wrap(~drought_level, scales = "free") +
scale_x_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
scale_y_continuous(labels = comma) +
scale_color_brewer(palette = "Set3", guide = "none") +
labs(
title = "Acres Burned vs. Homes Damaged by Drought Level",
subtitle = "Faceted by drought severity | Each panel shows different scale",
x = "Acres Burned (thousands)",
y = "Homes Damaged (>50% damage)",
caption = "Source: CALFIRE Data 2019-2023"
)
SOLUTION:
Observations by drought level:
Key insight: Drought severity appears to strengthen the relationship between fire size and home damage. This may be because during severe drought, fires burn more intensely and are harder to control, increasing structure damage per acre burned.
Before creating interactive plots, let us understand what makes them valuable and how to build them in R.
Static plots (like those created with ggplot2) are fixed images. Once rendered, users cannot explore the data further without modifying and re-running the code. Interactive visualizations, in contrast, allow users to explore data dynamically through direct manipulation.
Common interactive features include:
| Action | Purpose | Example Use Case |
|---|---|---|
| Hover | View exact values for individual data points | Identifying which year had 73,235 acres burned |
| Zoom | Focus on specific regions of interest | Examining a cluster of high-damage counties |
| Pan | Navigate across large datasets | Scrolling through a 24-year timeline |
| Filter | Show or hide data categories | Displaying only “Severe” drought years |
| Download | Save the current view as an image | Exporting a zoomed region for a report |
Interactive plots are particularly useful in the following scenarios:
Dense data: When datasets contain many observations, labels would clutter a static plot. Interactivity lets users reveal information on demand.
Precise values needed: When exact numbers matter (e.g., “How many homes were damaged in Butte County?”), hovering provides immediate answers without cluttering the display.
Exploratory analysis: When you need to examine different subsets or identify outliers, interaction is faster than repeatedly modifying code.
The plotly package provides two approaches for creating
interactive plots:
Approach 1: Convert existing ggplot2 objects
This approach leverages your existing ggplot2 knowledge. Create a
plot as usual, then wrap it with ggplotly():
# Create a standard ggplot
p <- ggplot(data, aes(x = year, y = value)) +
geom_point() +
geom_line()
# Convert to interactive
ggplotly(p)
Approach 2: Build directly with plot_ly()
This approach uses plotly’s native syntax, which offers more customization but requires learning new functions:
plot_ly(data, x = ~year, y = ~value, type = "scatter", mode = "lines+markers")
In this lab, we use Approach 1 because you already
know ggplot2 syntax. The ggplotly() function handles the
conversion automatically.
By default, ggplotly() displays the aesthetic mappings
(x, y, color, etc.) when hovering. For more informative tooltips, use
the text aesthetic:
# Add custom tooltip text using the text aesthetic
p <- ggplot(statewide, aes(x = year, y = acres_burned,
text = paste0("Year: ", year,
"<br>Acres: ", comma(acres_burned),
"<br>Fires: ", fire_count))) +
geom_point()
# Tell ggplotly to use only the custom text
ggplotly(p, tooltip = "text")
Key syntax notes:
paste0() to concatenate text without spaces<br> for line breaks in the tooltip (HTML
syntax)tooltip = "text" parameter tells plotly to use your
custom text instead of default aestheticsAfter converting to plotly, you can further customize with
layout():
ggplotly(p, tooltip = "text") %>%
layout(
hoverlabel = list(bgcolor = "white", font = list(size = 12)),
legend = list(orientation = "h", y = -0.15)
)
Now let us apply these concepts to create interactive wildfire visualizations.
Using plotly, convert your time series of acres
burned into an interactive plot where users can hover to see exact
values for each year. Include tooltips showing year, acres burned, and
fire count.
Instructions: Fill in the blanks
(_____) to complete the code. Use the concepts from the
introduction above.
# STEP 1: Create ggplot with custom tooltip text
# Hint: The text aesthetic creates hover information
p_timeseries <- ggplot(statewide, aes(x = year, y = acres_burned,
text = paste0("Year: ", _____,
"<br>Acres Burned: ", comma(round(_____)),
"<br>Fire Count: ", _____,
"<br>Drought: ", _____))) +
# STEP 2: Add geometries (provided)
geom_line(color = "#D32F2F", linewidth = 1) +
geom_point(aes(color = drought_level), size = 3) +
scale_color_manual(values = drought_colors, name = "Drought Level") +
scale_y_continuous(labels = comma) +
labs(
title = "Interactive: California Wildfire Severity (2001-2024)",
subtitle = "Hover over points for detailed information",
x = "Year",
y = "Acres Burned"
)
# STEP 3: Convert to plotly
# Hint: Use tooltip = "text" to display your custom text
ggplotly(p_timeseries, tooltip = "_____") %>%
layout(
hoverlabel = list(bgcolor = "white", font = list(size = 12)),
legend = list(orientation = "h", y = -0.15)
)
p_timeseries <- ggplot(statewide, aes(x = year, y = acres_burned,
text = paste0("Year: ", year,
"<br>Acres Burned: ", comma(round(acres_burned)),
"<br>Fire Count: ", fire_count,
"<br>Drought: ", drought_level))) +
geom_line(color = "#D32F2F", linewidth = 1) +
geom_point(aes(color = drought_level), size = 3) +
scale_color_manual(values = drought_colors, name = "Drought Level") +
scale_y_continuous(labels = comma) +
labs(
title = "Interactive: California Wildfire Severity (2001-2024)",
subtitle = "Hover over points for detailed information",
x = "Year",
y = "Acres Burned"
)
ggplotly(p_timeseries, tooltip = "text") %>%
layout(
hoverlabel = list(bgcolor = "white", font = list(size = 12)),
legend = list(orientation = "h", y = -0.15)
)
SOLUTION:
The interactive plot allows users to:
Create an interactive scatter plot using plotly showing all
58 counties with prop_wui vs. homes_damaged.
When users hover over a point, display the county name, region, and
population.
Instructions: Fill in the blanks to create the interactive scatter plot.
# STEP 1: Create ggplot with custom tooltip
# Hint: Include county, region, population, WUI proportion, and homes damaged in tooltip
p_scatter <- ggplot(county_data, aes(x = prop_wui, y = homes_damaged,
text = paste0("County: ", _____,
"<br>Region: ", _____,
"<br>Population: ", comma(_____),
"<br>WUI: ", percent(prop_wui, accuracy = 0.1),
"<br>Homes Damaged: ", comma(_____)))) +
# STEP 2: Add points with size mapped to population (provided)
geom_point(aes(color = region, size = population), alpha = 0.7) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_y_continuous(labels = comma) +
scale_size_continuous(range = c(3, 12), guide = "none") +
scale_color_brewer(palette = "Set3", name = "Region") +
labs(
title = "Interactive: WUI Exposure and Home Damage by County",
subtitle = "Hover over points for county details | Bubble size = population",
x = "Proportion of County in WUI",
y = "Homes Damaged or Destroyed (>50% damage)"
)
# STEP 3: Convert to plotly with custom tooltip
# Hint: Use tooltip = "text" to show your custom hover text
ggplotly(p_scatter, tooltip = "_____") %>%
layout(
hoverlabel = list(bgcolor = "white", font = list(size = 11)),
legend = list(orientation = "v", x = 1.02, y = 0.5)
)
p_scatter <- ggplot(county_data, aes(x = prop_wui, y = homes_damaged,
text = paste0("County: ", county,
"<br>Region: ", region,
"<br>Population: ", comma(population),
"<br>WUI: ", percent(prop_wui, accuracy = 0.1),
"<br>Homes Damaged: ", comma(homes_damaged)))) +
geom_point(aes(color = region, size = population), alpha = 0.7) +
scale_x_continuous(labels = percent_format(accuracy = 1)) +
scale_y_continuous(labels = comma) +
scale_size_continuous(range = c(3, 12), guide = "none") +
scale_color_brewer(palette = "Set3", name = "Region") +
labs(
title = "Interactive: WUI Exposure and Home Damage by County",
subtitle = "Hover over points for county details | Bubble size = population",
x = "Proportion of County in WUI",
y = "Homes Damaged or Destroyed (>50% damage)"
)
ggplotly(p_scatter, tooltip = "text") %>%
layout(
hoverlabel = list(bgcolor = "white", font = list(size = 11)),
legend = list(orientation = "v", x = 1.02, y = 0.5)
)
SOLUTION:
The interactive scatter plot reveals:
The TIME article claims wildfires are becoming “more frequent, larger, and happen in a longer fire season.” Based on your analysis of fire count and acres burned trends, does the data support the claim that fires are becoming “larger”? Cite specific statistics.
SOLUTION:
The data supports the claim that fires are becoming larger:
Evidence:
Mean acres burned increased 86%: From 12,141 acres (2001-2012) to 22,531 acres (2013-2024)
Extreme fire years are recent: The top 5 worst fire years by acres burned (2020, 2021, 2018, 2017) all occurred in the second half of the dataset
Maximum severity tripled: The worst year in 2001-2012 was 26,589 acres (2008); the worst in 2013-2024 was 73,235 acres (2020)
Correlation with temperature: The moderate positive correlation (r ≈ 0.49) between temperature and acres burned supports the climate connection
However, “more frequent” is less clearly supported:
Conclusion: The claim that fires are becoming “larger” is well-supported by the data. The claim about frequency is less definitive.
Create a summary visualization showing the 5 most important findings from this lab. You may choose the plot type(s) - be creative but ensure clarity.
# Create multiple summary plots
# Plot 1: Trend in fire severity
p1 <- ggplot(statewide, aes(x = year, y = acres_burned)) +
geom_col(fill = "#D32F2F", alpha = 0.8) +
geom_smooth(method = "loess", se = FALSE, color = "#1976D2", linewidth = 1.5) +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
labs(title = "1. Fire Severity Increasing",
subtitle = "Acres burned trending upward",
x = "Year", y = "Acres Burned (K)") +
theme(plot.title = element_text(size = 12))
# Plot 2: Temperature correlation
p2 <- ggplot(statewide, aes(x = avg_temp_f, y = acres_burned)) +
geom_point(size = 3, color = "#D32F2F", alpha = 0.7) +
geom_smooth(method = "lm", se = TRUE, color = "#1976D2") +
scale_y_continuous(labels = label_number(scale = 1e-3, suffix = "K")) +
labs(title = "2. Temperature Drives Fire Size",
subtitle = paste0("Correlation r = ", round(cor(statewide$avg_temp_f, statewide$acres_burned), 2)),
x = "Avg Temperature (°F)", y = "Acres Burned (K)") +
theme(plot.title = element_text(size = 12))
# Plot 3: Top counties
top5_summary <- county_data %>%
arrange(desc(homes_damaged)) %>%
head(5)
p3 <- ggplot(top5_summary, aes(x = reorder(county, homes_damaged), y = homes_damaged)) +
geom_col(fill = "#8B0000", alpha = 0.8) +
geom_text(aes(label = comma(homes_damaged)), hjust = -0.1, size = 3.5) +
coord_flip() +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
labs(title = "3. Butte County Most Affected",
subtitle = "Top 5 by homes damaged (>50%)",
x = NULL, y = "Homes Damaged") +
theme(plot.title = element_text(size = 12))
# Plot 4: WUI exposure
p4 <- ggplot(county_data, aes(x = prop_wui, y = damage_per_1000)) +
geom_point(color = "#D32F2F", alpha = 0.6, size = 2) +
geom_smooth(method = "lm", se = TRUE, color = "#1976D2") +
scale_x_continuous(labels = percent_format()) +
labs(title = "4. WUI Increases Risk",
subtitle = "More WUI = More damage per capita",
x = "% County in WUI", y = "Homes/1000 Residents") +
theme(plot.title = element_text(size = 12))
# Plot 5: Carbon emissions
p5 <- ggplot(statewide, aes(x = year, y = carbon_emissions)) +
geom_area(fill = "#2E7D32", alpha = 0.7) +
geom_line(color = "#1B5E20", linewidth = 1) +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M")) +
labs(title = "5. Carbon Footprint Growing",
subtitle = "Tree loss emissions peaked in 2021",
x = "Year", y = "Carbon Emissions (M Mg)") +
theme(plot.title = element_text(size = 12))
# Combine with patchwork
combined <- (p1 | p2) / (p3 | p4 | p5) +
plot_annotation(
title = "Key Findings: California Wildfire Analysis (2001-2024)",
subtitle = "Five critical insights from the data",
caption = "Source: CALFIRE Data | STAT 1014 Lab",
theme = theme(
plot.title = element_text(size = 16, face = "bold"),
plot.subtitle = element_text(size = 12, color = "gray40")
)
)
print(combined)
SOLUTION:
The summary visualization highlights five key findings:
The article states the southwest US is experiencing “the driest 22-year period in the last 1,200 years.” Using our data, calculate the percentage of years (2001-2024) that experienced “Severe,” “Extreme,” or “Exceptional” drought.
# Count severe drought years
severe_drought <- statewide %>%
filter(drought_level %in% c("Severe", "Extreme", "Exceptional")) %>%
nrow()
total_years <- nrow(statewide)
pct_severe <- (severe_drought / total_years) * 100
cat("Years with Severe/Extreme/Exceptional drought:", severe_drought, "out of", total_years, "\n")
## Years with Severe/Extreme/Exceptional drought: 8 out of 24
cat("Percentage:", round(pct_severe, 1), "%\n")
## Percentage: 33.3 %
# Breakdown
drought_breakdown <- statewide %>%
group_by(drought_level) %>%
summarise(count = n()) %>%
mutate(percentage = round(count / sum(count) * 100, 1))
print(drought_breakdown)
## # A tibble: 5 × 3
## drought_level count percentage
## <fct> <int> <dbl>
## 1 Abnormally Dry 10 41.7
## 2 Moderate 6 25
## 3 Severe 3 12.5
## 4 Extreme 3 12.5
## 5 Exceptional 2 8.3
SOLUTION:
33.3% of years (8 out of 24) experienced Severe, Extreme, or Exceptional drought:
This aligns with the TIME article’s characterization of persistent drought conditions in the region.
San Francisco has 0 recorded wildfires and 0 homes damaged despite having over 836,000 residents. Using what you’ve learned about the 10-acre reporting threshold and urban density, explain why this makes statistical sense. What does this tell us about the nature of wildfire data?
SOLUTION:
San Francisco’s zero wildfire records make statistical sense for several interconnected reasons:
Physical/Geographic Factors:
Data Collection Factors:
What this teaches us about wildfire data:
Statistical principle: Always understand how data is collected before interpreting zeros or absences.
Create a final summary boxplot comparing carbon emissions across the 10 California regions. Which regions contribute most to tree cover loss-related emissions? Why might this be?
# Calculate regional carbon emissions
regional_carbon <- county_data %>%
group_by(region) %>%
summarise(
total_carbon = sum(carbon_emissions, na.rm = TRUE),
median_carbon = median(carbon_emissions, na.rm = TRUE),
n_counties = n()
) %>%
arrange(desc(total_carbon))
print(regional_carbon)
## # A tibble: 10 × 4
## region total_carbon median_carbon n_counties
## <chr> <int> <dbl> <int>
## 1 Superior California 458937304 14333338. 18
## 2 North Coast 89178852 13418654. 6
## 3 Southern San Joaquin Valley 41421704 1734585 5
## 4 Northern San Joaquin Valley 34461444 1915616. 10
## 5 Central Coast 15555659 250186. 6
## 6 San Francisco Bay Area 4496900 420294. 6
## 7 Inland Empire 2128732 1064366 2
## 8 Los Angeles County 1824944 912472 2
## 9 Orange County 123506 123506 1
## 10 San Diego-Imperial 122853 61426. 2
# Create boxplot
ggplot(county_data, aes(x = reorder(region, carbon_emissions, FUN = median),
y = carbon_emissions, fill = region)) +
geom_boxplot(alpha = 0.8, outlier.shape = 21, show.legend = FALSE) +
coord_flip() +
scale_y_continuous(labels = label_number(scale = 1e-6, suffix = "M")) +
scale_fill_brewer(palette = "Set3") +
labs(
title = "Carbon Emissions from Tree Cover Loss by California Region",
subtitle = "Superior California contributes most to emissions from tree cover loss",
x = NULL,
y = "Carbon Emissions (Million Mg)",
caption = "Source: Global Forest Watch Data 2019-2023 | Includes all drivers of tree cover loss"
)
SOLUTION:
Superior California contributes the most to tree cover loss-related carbon emissions, followed by the North Coast and Northern San Joaquin Valley.
Why these regions?
Important note: Carbon emissions include all tree cover loss (wildfires, logging, natural disturbances), not just fire-related loss.
Policy implication: Climate mitigation efforts should focus on fire prevention and sustainable forest management in heavily forested regions.
The TIME article concludes: “Living with wildfire is how humans manage risk.” Based on your analysis, propose 2-3 data-driven recommendations for how California residents and policymakers might better manage wildfire risk.
SOLUTION:
Based on the data analysis, here are three potential strategies worth investigating further:
1. Target WUI Development and Building Codes
2. Focus Resources on High-Impact Counties
3. Prepare for Climate Variability, Not Just Trends
Important caveat: These are hypotheses generated from observational data, not proven solutions. Each would require further study, stakeholder input, and careful consideration of unintended consequences before implementation.
Reflecting on the data and article, do you think there is a “lack of awareness about how much most people living in the west are living in areas prone to wildfire”? What data from this lab supports your position?
SOLUTION:
This is a genuinely difficult question, and the data supports multiple interpretations:
Evidence suggesting lack of awareness:
Widespread WUI exposure: Many California counties have 40-80% of their land classified as WUI, yet development continues
Income doesn’t predict caution: The weak correlation between income and fire damage suggests wealthy communities aren’t necessarily avoiding fire-prone areas
Populated areas at risk: Los Angeles County has over 928,000 acres of WUI with nearly 10 million residents
Historical amnesia: The most destructive fire years are recent (2017-2021), but development patterns haven’t dramatically shifted
Evidence suggesting awareness exists but other factors dominate:
People may be aware but accept risk for lifestyle benefits (views, space, cost)
Insurance market changes suggest increasing awareness
Housing affordability may force people into fire-prone areas regardless of awareness
What the data cannot tell us:
What people actually know about fire risk (we’d need surveys)
Whether people who move to WUI areas understand the risk vs. those who were already there
Conclusion: The continued development in WUI areas, combined with the socioeconomic data showing damage across all income levels, suggests either lack of awareness OR a collective underestimation of wildfire risk.
What was the most surprising finding from your analysis? What additional data would you want to collect to better understand California Wildfire risk?
SOLUTION:
Most surprising finding: (Student answers will vary, but instructor examples include:)
The “Moderate” drought level showing the highest median acres burned — counter to expectations that extreme drought = extreme fires
The weak correlation between WUI area and home damage — suggesting fire location and intensity matter more than simple exposure metrics
San Francisco’s complete absence from the data despite being in a fire-prone state
Additional data that would improve analysis:
This lab has demonstrated how statistical analysis can illuminate complex environmental issues. By examining California Wildfire data from 2001-2024, we found evidence supporting several claims from the TIME Magazine article:
Most importantly, we learned that “living with wildfire” requires understanding the data: where fires occur, what drives their severity, and how risk varies across communities. As climate change continues to affect fire patterns, data-driven decision-making will be essential for managing this ongoing challenge.
Explore More: Interactive Climate Simulations
To better understand how temperature shifts impact the broader world, explore these interactive resources:
| Resource | Description | Link |
|---|---|---|
| En-ROADS (Climate Interactive & MIT Sloan) | Test global policy and energy scenarios to see their impact on temperature | https://en-roads.climateinteractive.org |
| Climate Time Machine (NASA) | Visualize how temperature, sea ice, and CO₂ have changed over time | https://climate.nasa.gov/interactives/climate-time-machine |
| Sea Level Rise Viewer (Climate Central) | Explore how warming scenarios affect coastal flooding | https://coastal.climatecentral.org |
| 1.5°C vs 2°C Impacts (Carbon Brief) | Compare consequences of different warming levels across sectors | https://interactive.carbonbrief.org/impacts-climate-change-one-point-five-degrees-two-degrees |
These tools help contextualize why a “small” 1.38°F change in California connects to larger global climate dynamics.
| Source | Description | URL |
|---|---|---|
| CAL FIRE | Historical Wildfire Incidents by CA Census Tract | https://www.fire.ca.gov/incidents |
| NOAA NCEI | Historical Temperature by CA County | https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/ |
| The National Drought Mitigation Center | Historical Drought by CA County | https://droughtmonitor.unl.edu/DmData/DataDownload.aspx |
| CAL FIRE DINS | Wildfire Housing Damage in California | https://www.fire.ca.gov/what-we-do/fire-protection/damage-inspection |
| Global Forest Watch | Tree Cover Loss and Carbon Emissions | https://www.globalforestwatch.org/dashboards/country/USA/5 |
| U.S. Census Bureau | American Community Survey 5-Year Estimates | https://data.census.gov |
| CAL FIRE | Wildland-Urban Interface | https://data.ca.gov/dataset/wildland-urban-interface |
Note on Carbon Emissions Data: The carbon emissions and tree cover loss data from Global Forest Watch includes all drivers of tree cover loss, not just wildfires. This includes emissions from wildfires, logging and forestry operations, agricultural conversion, natural disturbances (storms, insects, flooding), and urban development. While wildfires are a major contributor in California, these figures represent total forest carbon impact.